ON THE TOTAL VARIATION DISTANCE BETWEEN THE BINOMIAL
RANDOM GRAPH AND THE RANDOM INTERSECTION GRAPH 

JEONG HAN KIM, SANG JUNE LEE, AND JOOHAN NA 


Abstract. When each vertex is assigned a set, the intersection graph generated by the sets is the graph
in which two distinct vertices are joined by an edge if and only if their assigned sets have a nonempty
intersection. An interval graph is an intersection graph generated by intervals in the real line. A
chordal graph can be considered as an intersection graph generated by subtrees of a tree. In 1999,
Karoński, Scheinerman and Singer-Cohen [Combin Probab Comput 8 (1999), 131-159] introduced a
random intersection graph by taking randomly assigned sets. The random intersection graph $G(n,m;p)$
has $n$ vertices, and the sets assigned to the vertices are i.i.d. random subsets of a fixed set $M$
of size $m$, where each element of $M$ belongs to each random subset with probability $p$, independently of
all other elements in $M$. Fill, Scheinerman and Singer-Cohen [Random Struct Algorithms 16 (2000),
156-176] showed that the total variation distance between the random graph $G(n,m;p)$ and the
Erdős-Rényi graph $G(n,\hat p)$ tends to 0 for any $0 < p = p(n) < 1$ if $m = n^{\alpha}$, $\alpha > 6$, where $\hat p$ is chosen so that
the expected numbers of edges in the two graphs are the same. In this paper, it is proved that the
total variation distance still tends to 0 for any $0 < p = p(n) < 1$ whenever $m \gg n^4$.


1. Introduction 

The intersection graph on $V = \{1,\dots,n\}$ generated by a collection $\{L_1,\dots,L_n\}$ of sets is the graph
on $V$ in which two distinct vertices $i$ and $j$ are adjacent if and only if their corresponding sets $L_i$ and
$L_j$ have a nonempty intersection. In 1945, Szpilrajn-Marczewski [29] observed that every graph may
be represented as an intersection graph. Later, Erdős, Goodman and Pósa [12] showed that every
graph with $n$ vertices can be represented as an intersection graph generated by subsets of a set of $n^2/4$
elements. An interval graph is an intersection graph generated by intervals in the real line. A chordal
graph turned out to be an intersection graph generated by subtrees of a tree [Hj. In general, a class of
graphs is called an intersection class of a family $\mathcal{F}$ of sets if each graph in the class is an intersection
graph generated by sets in $\mathcal{F}$. Scheinerman found a necessary and sufficient condition for a class
of graphs to be an intersection class of a family $\mathcal{F}$ of sets. Intersection graphs have been applied to
phylogeny problems in biology, seriation problems in psychology [TS], and contingency tables in
statistics, etc. For more details, see [24].

In 1999, Karoński, Scheinerman and Singer-Cohen [20] introduced the random intersection graph,
which is the intersection graph generated by independent and identically distributed (i.i.d.) random
subsets $L_1,\dots,L_n$ of $M = \{1,\dots,m\}$. Fill, Scheinerman and Singer-Cohen [13] considered conditions
under which the random intersection graph is essentially the binomial random graph (that is, the
Erdős-Rényi random graph with independently chosen edges) with the same expected number of
edges. Let $G(n,m;p)$ denote the random intersection graph generated by i.i.d. random subsets
$L_1,\dots,L_n$ whose distributions are binomial with parameters $(m,p)$, i.e., for a subset $A$ of $M$,
$\Pr[L_i = A] = p^{|A|}(1-p)^{m-|A|}$. Fill, Scheinerman and Singer-Cohen were interested in how close $G(n,m;p)$
is to $G(n,\hat p)$ in terms of total variation distance, where $\hat p$ is chosen so that the expected numbers of
edges in the two graphs are the same, i.e.,


$$\hat p := 1 - (1 - p^2)^m.$$
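The choice of $\hat p$ can be checked by brute force for a tiny ground set. The sketch below is only an illustration under stated assumptions: the helper names `subset_prob`, `edge_probability_by_enumeration` and `p_hat` are hypothetical and not from the paper. It sums $\Pr[L_i = A]\Pr[L_j = B]$ over all intersecting pairs $(A, B)$ with exact rational arithmetic and compares the result with $1 - (1 - p^2)^m$.

```python
from fractions import Fraction
from itertools import combinations


def subset_prob(size, m, p):
    """Probability that a binomial random subset of an m-set equals one fixed set of this size."""
    return p ** size * (1 - p) ** (m - size)


def edge_probability_by_enumeration(m, p):
    """Sum Pr[L_i = A] * Pr[L_j = B] over all pairs (A, B) sharing at least one element."""
    subsets = [frozenset(c) for k in range(m + 1) for c in combinations(range(m), k)]
    total = Fraction(0)
    for a in subsets:
        for b in subsets:
            if a & b:  # nonempty intersection means i and j are adjacent
                total += subset_prob(len(a), m, p) * subset_prob(len(b), m, p)
    return total


def p_hat(m, p):
    """Edge probability of the binomial random graph with the same expected number of edges."""
    return 1 - (1 - p ** 2) ** m


if __name__ == "__main__":
    m, p = 4, Fraction(1, 3)
    print(edge_probability_by_enumeration(m, p), p_hat(m, p))
```

The enumeration is exponential in `m`, so it is meant only as a sanity check for very small ground sets.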


The total variation distance between two (graph-valued) random variables X and Y is defined by 

$$\mathrm{TV}(X,Y) = \frac{1}{2}\sum_{G}\bigl|\Pr[X = G] - \Pr[Y = G]\bigr|,$$

where the sum is taken over all possible values of X and Y. 
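As a small illustration of this definition, the following sketch computes the total variation distance between two finite distributions given as dictionaries. The function name `total_variation` and the toy distributions are assumptions made for the example only.

```python
from fractions import Fraction


def total_variation(dist_x, dist_y):
    """Half of the L1 distance between two distributions given as {value: probability}."""
    support = set(dist_x) | set(dist_y)
    return sum(abs(dist_x.get(v, 0) - dist_y.get(v, 0)) for v in support) / 2


if __name__ == "__main__":
    # Toy case: a graph on two vertices is either "empty" or a single "edge".
    g_a = {"empty": Fraction(3, 4), "edge": Fraction(1, 4)}
    g_b = {"empty": Fraction(1, 2), "edge": Fraction(1, 2)}
    g_c = {"empty": Fraction(1, 4), "edge": Fraction(3, 4)}
    print(total_variation(g_a, g_b), total_variation(g_b, g_c), total_variation(g_a, g_c))
```

The triangle inequality for this distance is what later allows the distance between the two random graphs to be split into a sum of distances between intermediate models.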


Theorem 1.1 ([13, Theorem 10]). Let $\alpha > 6$ be a constant and $m = n^{\alpha}$. Then for any $0 < p = p(n) < 1$,

$$\mathrm{TV}\bigl(G(n,m;p),G(n,\hat p)\bigr) = o(1).$$

For $3 < \alpha < 6$, Rybarczyk [26] proved a weaker result. Namely, for any monotone property $\mathcal{A}$,
$\Pr[G(n,m;p) \in \mathcal{A}]$ and $\Pr[G(n,\hat p) \in \mathcal{A}]$ are essentially the same. The exact statements of the theorems
there are rather complicated.

Random intersection graphs have received a lot of attention due to a great diversity of applications
in areas such as epidemics [9], circuit design [20], network user profiling [23] and analysis of complex
networks OlllTKlol. For more information, we refer the reader to the survey papers [51E1128]. For
instance, $G(n,m;p)$ is applicable to gate matrix circuit design, which is related to the optimization
problem of finding a permutation of the order of gate lines that minimizes the number of horizontal
tracks required to lay out the circuit. The problem is NP-hard in general, but it is solvable in $O(n)$ time
when $G$ is an interval graph [16]. Karoński, Scheinerman and Singer-Cohen studied conditions under
which $G(n,m;p)$ is an interval graph with high probability.

When the $L_i$'s are uniformly distributed in the class of subsets of $M$ of the same size, the random
intersection graph generated by the $L_i$'s is called a uniform random intersection graph. An application
to security of wireless sensor networks |2l[8l[IIl[25] is one of the main motivations for studying the
uniform random intersection graph. The random intersection graph can be generalized in the way that
the vertices $i$ and $j$ are adjacent if $L_i$ and $L_j$ have at least $s \ge 1$ common elements. The generalization
is applicable to cluster analysis.

The random intersection graph $G(n,m;p)$ may be defined using an $n \times m$ random matrix $R(n,m;p)$
whose rows are indexed by $i \in V$ and columns are indexed by $a \in M$. Each entry of the matrix is 1
or 0 with probability $p$ and $1 - p$, respectively, independently of all other entries. The row vector
indexed by $i \in V$ corresponds to the subset $L_i$ of $M$. On the other hand, the column vector indexed
by $a \in M$ corresponds to the set $V_a$ of all vertices $i \in V$ with $a \in L_i$. The graph $G(n,m;p)$ may be
alternatively constructed by taking the edge set to be the union of edge sets of the complete graphs
on $V_a$ for all $a \in M$.
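The two descriptions of $G(n,m;p)$ above (intersecting rows versus a union of cliques on the columns) can be compared directly in code. This is a minimal sketch, not the authors' implementation; the function names `random_matrix`, `edges_from_rows` and `edges_from_columns` are hypothetical, and vertices and elements are indexed from 0.

```python
import random
from itertools import combinations


def random_matrix(n, m, p, rng):
    """An n x m 0/1 matrix with independent entries equal to 1 with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(m)] for _ in range(n)]


def edges_from_rows(matrix):
    """Join i and j when rows i and j share a column holding a 1 (the sets L_i, L_j intersect)."""
    n = len(matrix)
    return {(i, j) for i, j in combinations(range(n), 2)
            if any(x and y for x, y in zip(matrix[i], matrix[j]))}


def edges_from_columns(matrix):
    """Union of the complete graphs on V_a, the set of rows with a 1 in column a."""
    n = len(matrix)
    m = len(matrix[0]) if matrix else 0
    edges = set()
    for a in range(m):
        v_a = [i for i in range(n) if matrix[i][a]]
        edges.update(combinations(v_a, 2))
    return edges


if __name__ == "__main__":
    rng = random.Random(2024)
    R = random_matrix(8, 5, 0.3, rng)
    print(edges_from_rows(R) == edges_from_columns(R))
```

The column view is the one used in the rest of the argument, because columns with three or more 1's are exactly what distinguishes the model from the binomial random graph.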

The main difference between $G(n,m;p)$ and $G(n,\hat p)$ is the complete graphs induced by the column
vectors with three or more 1's. In particular, the triangles formed by the columns with exactly three
1's play an important role. Those triangles are to be called artifact triangles. Roughly speaking,
if $mp^2$ is large, then $\hat p$ is close to 1 so that both of $G(n,m;p)$ and $G(n,\hat p)$ are almost the complete
graphs with high probability. On the other hand, if $mp^2$ is small, then the expected number of artifact
triangles is of order $n^3mp^3 = n^3m^{-1/2}(mp^2)^{3/2}$, which goes to 0, provided $m \gg n^6$. Theorem 1.1
was proved based upon this observation.

In this paper, we will show that the total variation distance is still small enough even if there
are some artifact triangles. It is actually small as long as the expected number of pairs of distinct
artifact triangles with a common edge is small. If the expected number is not small, the total variation
distance may be small when both of $G(n,m;p)$ and $G(n,\hat p)$ are almost the complete graphs with high
probability. Based on these two facts, we infer that if $m \gg n^4$ then the total variation distance is
always small for any $p$: It turns out that the expected number is $O(n^4m^2p^6)$. To have the total
variation distance small for all $p$, it is required that $mp^2$ is large when $n^4m^2p^6 = (n^4/m)(mp^2)^3$ is not
small, which holds if $m \gg n^4$.

Theorem 1.2. For $m \gg n^4$ and $0 < p = p(n) < 1$, we have that

$$\mathrm{TV}\bigl(G(n,m;p),G(n,\hat p)\bigr) = o(1).$$

In the next section, we give the outline of the proof of Theorem 1.2. The proof will be divided into
four parts, which will be proved in Sections 2-5.


2. Preliminaries and Outline of proof of Theorem 1.2


If $p \ge \bigl(\frac{3\log n}{m}\bigr)^{1/2}$, both of $G(n,m;p)$ and $G(n,\hat p)$ are the complete graphs with probability $1 - O\bigl(\frac{1}{n}\bigr)$.
Indeed, for each edge $e$,

$$\Pr[e \notin G(n,m;p)] = (1 - p^2)^m \le e^{-mp^2} \le \frac{1}{n^3},$$

and hence $G(n,m;p)$ is the complete graph with probability $1 - O\bigl(\frac{1}{n}\bigr)$. Since the expected numbers of
edges in $G(n,m;p)$ and $G(n,\hat p)$ are the same, $G(n,\hat p)$ is the complete graph with probability $1 - O\bigl(\frac{1}{n}\bigr)$
as well. Therefore,

$$\mathrm{TV}\bigl(G(n,m;p),G(n,\hat p)\bigr) = O\Bigl(\frac{1}{n}\Bigr).$$

In the rest of the paper, we assume that

$$0 < p < \Bigl(\frac{3\log n}{m}\Bigr)^{1/2}.$$
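The threshold computation above is easy to check numerically. The sketch below is illustrative only; the helper names `threshold` and `non_edge_probability` and the sample values of $n$ and $m$ are assumptions. It evaluates $(1-p^2)^m$ at $p = (3\log n/m)^{1/2}$ and the resulting union bound over all $\binom{n}{2}$ pairs.

```python
import math


def threshold(n, m):
    """The value (3 log n / m)^(1/2) separating the two regimes of p."""
    return math.sqrt(3 * math.log(n) / m)


def non_edge_probability(m, p):
    """Probability that a fixed pair of vertices is non-adjacent in G(n, m; p)."""
    return (1 - p * p) ** m


if __name__ == "__main__":
    n = 50
    m = n ** 4
    p = threshold(n, m)
    q = non_edge_probability(m, p)
    # Union bound over all pairs: probability that some edge is missing.
    print(q, n ** -3, math.comb(n, 2) * q)
```

For larger $p$ the non-edge probability only decreases, so the same union bound applies throughout the regime $p \ge (3\log n/m)^{1/2}$.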


As described in the introduction, the random intersection graph $G(n,m;p)$ may be constructed using
an $n \times m$ random matrix $R(n,m;p)$ whose rows are indexed by $v \in V$ and columns are indexed by
$a \in M$. For fixed $a \in M$, the probability of $V_a := \{v \in V : a \in L_v\}$ being a fixed $k$-subset of
$V$ is $p^k(1-p)^{n-k}$ for integer $k \ge 2$. Hence $V_a$ is the $k$-subset for some $a \in M$ with probability
$1 - (1 - p^k(1-p)^{n-k})^m$, which will be approximated by

$$p_k := 1 - e^{-mp^k(1-p)^{n-k}}.$$

Also, $G(n,m;p)$ will be approximated by another random graph $G(n,(p_k))$, which is to be defined
below.

For $0 < p^* < 1$, let $\mathcal{H}_k(n,p^*)$ be a random collection of $k$-subsets of $V$ to which each $k$-subset belongs
with probability $p^*$, independently of all other $k$-subsets. For $H \subseteq V$, let $K(H)$ be the complete graph
on $H$. Then, for a collection $\mathcal{H}$ of subsets of $V$, let $K(\mathcal{H})$ denote the graph on $V$ whose edge set is the
union of edge sets of the complete graphs $K(H)$ on $H \in \mathcal{H}$. Notice that $K(\mathcal{H}_2(n,p^*))$ is the binomial
random graph $G(n,p^*)$. For $p_k$ defined above, let $G(n,(p_k))$ be the random graph on $V$ whose edge
set is the union of edge sets of $K(\mathcal{H}_2(n,p_2)), K(\mathcal{H}_3(n,p_3)), \dots, K(\mathcal{H}_k(n,p_k)), \dots$.
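The layered construction of $G(n,(p_k))$ can be sketched as follows. This is an illustrative sampler under stated assumptions: the names `sample_hypergraph`, `clique_union` and `sample_layered_graph` are hypothetical, the probabilities are passed as a dictionary `{k: p_k}`, and any `k` not listed is treated as having probability 0.

```python
import random
from itertools import combinations


def sample_hypergraph(n, k, prob, rng):
    """Random collection of k-subsets of {0, ..., n-1}, each kept independently with probability prob."""
    return [s for s in combinations(range(n), k) if rng.random() < prob]


def clique_union(collection):
    """Edge set of the union of the complete graphs on the sets in the collection."""
    edges = set()
    for h in collection:
        edges.update(combinations(h, 2))
    return edges


def sample_layered_graph(n, probs, rng):
    """Union over k of the clique unions of independent random k-uniform collections."""
    edges = set()
    for k, prob in probs.items():
        edges |= clique_union(sample_hypergraph(n, k, prob, rng))
    return edges


if __name__ == "__main__":
    rng = random.Random(7)
    print(sorted(sample_layered_graph(6, {2: 0.2, 3: 0.05, 4: 0.01}, rng)))
```

With only the layer `k = 2` present, the sampler reduces to the binomial random graph, which mirrors the remark that $K(\mathcal{H}_2(n,p^*))$ is $G(n,p^*)$.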

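For $m \gg n^4$ and $p < \bigl(\frac{3\log n}{m}\bigr)^{1/2}$, the probability of $\bigcup_{k \ge 5}\mathcal{H}_k(n,p_k)$ being nonempty is upper bounded
by

$$\sum_{k \ge 5}\binom{n}{k}p_k \le \sum_{k \ge 5} n^k m p^k = O(n^5mp^5) = O\Bigl(\frac{\log^{5/2} n}{n}\Bigr).$$

Thus, for $G(n,p_2,p_3,p_4) := G(n,(p_2,p_3,p_4,0,0,\dots))$,

$$\mathrm{TV}\bigl(G(n,(p_k)),G(n,p_2,p_3,p_4)\bigr) \le \Pr\Bigl[\bigcup_{k \ge 5}\mathcal{H}_k(n,p_k) \ne \emptyset\Bigr] = O\Bigl(\frac{\log^{5/2} n}{n}\Bigr).$$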


We will further approximate $G(n,p_2,p_3,p_4)$ by $G(n,p_2)$, which is the main contribution of this paper.
Summarizing all, since the total variation distance between $G(n,m;p)$ and $G(n,\hat p)$ is upper bounded
by the sum of $\mathrm{TV}(G(n,m;p),G(n,(p_k)))$, $\mathrm{TV}(G(n,(p_k)),G(n,p_2,p_3,p_4))$, $\mathrm{TV}(G(n,p_2,p_3,p_4),G(n,p_2))$
and $\mathrm{TV}(G(n,p_2),G(n,\hat p))$, it is enough to show that each total variation distance tends to 0. As the
bound for the second one is described above, we will prove that the other three total variation distances
tend to 0 in Sections 3, 4 and 5, respectively.

3. TOTAL VARIATION DISTANCE BETWEEN $G(n,m;p)$ AND $G(n,(p_k))$

To prove that the total variation distance between $G(n,m;p)$ and $G(n,(p_k))$ tends to 0, we will use
a coupling argument. For two random variables $X$ and $Y$, a coupling of $X$ and $Y$ is a vector $(X',Y')$
of random variables such that the marginal distributions of $(X',Y')$ are the distributions of $X$ and
$Y$, respectively. The total variation distance between $X$ and $Y$ is upper bounded by the probability
of $X' \ne Y'$ for any coupling $(X',Y')$ of $X$ and $Y$. On the other hand, there always exists a coupling
$(X',Y')$ so that the total variation distance of $X$ and $Y$ is equal to the probability of $X' \ne Y'$.

Lemma 3.1 ([22, Chapter I, Theorem 5.2]). Let $X$ and $Y$ be random variables. Then any coupling
$(X',Y')$ of $X$ and $Y$ satisfies

$$\mathrm{TV}(X,Y) \le \Pr[X' \ne Y'].$$

Moreover, there exists a coupling for which the equality holds, i.e.,

$$\mathrm{TV}(X,Y) = \Pr[X' \ne Y'].$$
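For finite distributions, a coupling attaining equality in Lemma 3.1 can be written down explicitly. The sketch below uses the standard construction that puts the common mass $\min\{\Pr[X = v], \Pr[Y = v]\}$ on the diagonal and spreads the leftover mass as a product; the function names are hypothetical, and this is an illustration rather than the construction used in [22].

```python
from fractions import Fraction


def total_variation(p, q):
    """Half of the L1 distance between two finite distributions {value: probability}."""
    support = set(p) | set(q)
    return sum(abs(p.get(v, 0) - q.get(v, 0)) for v in support) / 2


def maximal_coupling(p, q):
    """Joint distribution {(x, y): prob} with marginals p and q and Pr[X' != Y'] = TV(p, q)."""
    support = list(set(p) | set(q))
    overlap = {v: min(p.get(v, 0), q.get(v, 0)) for v in support}
    leftover = 1 - sum(overlap.values())  # equals TV(p, q)
    joint = {(v, v): overlap[v] for v in support if overlap[v] > 0}
    if leftover > 0:
        for x in support:
            rx = p.get(x, 0) - overlap[x]  # mass of p not matched on the diagonal
            if rx == 0:
                continue
            for y in support:
                ry = q.get(y, 0) - overlap[y]  # mass of q not matched on the diagonal
                if ry == 0:
                    continue
                joint[(x, y)] = joint.get((x, y), 0) + rx * ry / leftover
    return joint


if __name__ == "__main__":
    p = {0: Fraction(1, 2), 1: Fraction(1, 2)}
    q = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
    joint = maximal_coupling(p, q)
    mismatch = sum(w for (x, y), w in joint.items() if x != y)
    print(mismatch, total_variation(p, q))
```

The leftover masses of `p` and `q` live on disjoint sets of values, so the product part never adds mass to the diagonal.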

Using an appropriate coupling between a binomial random variable and a Poisson random variable,
we will prove the following proposition, which may be applied for the case $m \gg n^2\log n$. The
proposition is essentially the same as Lemma 5 in [26]. We prove it for the sake of completeness.

Proposition 3.2. Let $m \gg n^2\log n$, $0 < p < \bigl(\frac{3\log n}{m}\bigr)^{1/2}$ and $p_k = 1 - e^{-mp^k(1-p)^{n-k}}$ for integers
$k \ge 2$. Then

$$\mathrm{TV}\bigl(G(n,m;p),G(n,(p_k))\bigr) = O\Bigl(\frac{n^2\log n}{m}\Bigr).$$

Proof. Let $X$ be the number of columns of the matrix $R(n,m;p)$ with two or more 1's, or equivalently,
the number of $a \in M$ with $|V_a| \ge 2$. Since

$$\Pr[|V_a| = k] = \binom{n}{k}p^k(1-p)^{n-k} =: q_k$$

for any fixed $a \in M$, the random variable $X$ has the binomial distribution with parameters $m$ and
$q_{\ge 2} := \sum_{k \ge 2} q_k$, i.e.,

$$\Pr[X = \ell] = \binom{m}{\ell}(q_{\ge 2})^{\ell}(1 - q_{\ge 2})^{m-\ell}.$$

The random graph $G(n,m;p)$ may be constructed as follows: Take i.i.d. random complete graphs
$K^{(1)}, K^{(2)}, \dots$ on subsets of $V$, where the number of vertices in $K^{(i)}$ is $k \ge 2$ with probability $q_k/q_{\ge 2}$,
and then, once the number is given to be $k$, every $k$-subset of $V$ is equally likely to be the vertex set
of $K^{(i)}$. In other words, for a $k$-subset $U$ of $V$ with $k \ge 2$, the probability of $U$ being the vertex set
of $K^{(i)}$ is $q_k / \bigl(q_{\ge 2}\binom{n}{k}\bigr)$. (As $\sum_{k \ge 2} q_k/q_{\ge 2} = 1$, the random complete graph $K^{(i)}$ is well-defined.) The edge set
of $G(n,m;p)$ is the union of edge sets of $X$ random complete graphs $K^{(1)}, \dots, K^{(X)}$.

We now take a Poisson random variable $Y$ with mean $mq_{\ge 2}$ that is coupled with $X$ so that
$\Pr[X \ne Y] = \mathrm{TV}(X,Y)$. Let $G_Y$ be the graph whose edge set is the union of edge sets of $K^{(1)}, \dots, K^{(Y)}$. Then

$$\mathrm{TV}(G(n,m;p),G_Y) \le \Pr[G(n,m;p) \ne G_Y] \le \Pr[X \ne Y] = \mathrm{TV}(X,Y).$$

On the other hand, $G_Y$ has the same distribution as $G(n,(p_k))$. Indeed, for each subset $U$ of
$V$ with $|U| \ge 2$, let $Z(U)$ be the number of $i = 1,2,\dots,Y$ such that the vertex set of $K^{(i)}$ is $U$.
Then, it is well-known that for $k = |U|$, the $Z(U)$'s are independent Poisson random variables with mean
$mq_{\ge 2} \cdot \frac{q_k}{q_{\ge 2}\binom{n}{k}} = mp^k(1-p)^{n-k}$, and hence $\Pr[Z(U) > 0] = 1 - e^{-mp^k(1-p)^{n-k}} = p_k$. Since the edge
set of $G_Y$ is the union of edge sets of the complete graphs on $U$ with $Z(U) > 0$, $G_Y$ has the same
distribution as $G(n,(p_k))$.

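The desired bound follows from the fact that the total variation distance between the binomial
random variable $X$ with parameters $m$, $q_{\ge 2}$ and the Poisson random variable $Y$ with mean $mq_{\ge 2}$ is not
more than $q_{\ge 2}$ [H Theorem 2.4], and

$$q_{\ge 2} = \sum_{k \ge 2}\binom{n}{k}p^k(1-p)^{n-k} = O(n^2p^2) = O\Bigl(\frac{n^2\log n}{m}\Bigr).$$

□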
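The binomial-to-Poisson step in the proof above can be checked numerically for concrete parameters. The sketch below is an assumption-laden illustration, not part of the proof: the function names are hypothetical, and the probability mass functions are evaluated through `math.lgamma` to avoid overflow. It computes the total variation distance between a binomial variable with parameters $(m, q)$ and a Poisson variable with mean $mq$, to be compared with the bound $q$.

```python
import math


def binomial_pmf(m, q, k):
    """Pr[Bin(m, q) = k] for 0 < q < 1, computed in log space."""
    log_pmf = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
               + k * math.log(q) + (m - k) * math.log1p(-q))
    return math.exp(log_pmf)


def poisson_pmf(lam, k):
    """Pr[Poisson(lam) = k] for lam > 0, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def tv_binomial_poisson(m, q):
    """Total variation distance between Bin(m, q) and Poisson(m q)."""
    lam = m * q
    body = sum(abs(binomial_pmf(m, q, k) - poisson_pmf(lam, k)) for k in range(m + 1))
    # The Poisson variable can exceed m, where the binomial has no mass.
    poisson_tail = max(0.0, 1.0 - sum(poisson_pmf(lam, k) for k in range(m + 1)))
    return 0.5 * (body + poisson_tail)


if __name__ == "__main__":
    m, q = 200, 0.01
    print(tv_binomial_poisson(m, q), "bound:", q)
```

In the proof the role of `q` is played by the probability that a column has at least two 1's, which is $O(n^2p^2)$.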


4. TOTAL VARIATION DISTANCE BETWEEN $G(n,p_2,p_3,p_4)$ AND $G(n,p_2)$


In this section, we prove that the total variation distance between $G(n,p_2,p_3,p_4)$ and $G(n,p_2)$ tends
to 0. This is the main contribution of the paper. Intuitively, if there are no artifact triangles (and
no columns with at least four 1's) with high probability, then $G(n,p_2,p_3,p_4)$ and $G(n,p_2)$ should be
almost the same. We will show that $\mathrm{TV}(G(n,p_2,p_3,p_4),G(n,p_2))$ is still small enough even if there are
a few artifact triangles. As mentioned earlier, it actually turns out that the distance is small enough if
the expected number of pairs of distinct artifact triangles with a common edge is small. When the
expected number is not small, the total variation distance tends to 0 provided that $mp^2$ is sufficiently
large. Keeping this in mind, we prove the following proposition.


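Proposition 4.1. Let $m \gg n^4$, $0 < p < \bigl(\frac{3\log n}{m}\bigr)^{1/2}$ and $p_k = 1 - e^{-mp^k(1-p)^{n-k}}$ for $k \ge 2$. Then

$$\mathrm{TV}\bigl(G(n,p_2,p_3,p_4),G(n,p_2)\bigr) = O(\varepsilon),$$

where

$$\varepsilon := \max\Bigl\{\frac{1}{\log n},\ \frac{1}{\log(m/n^4)}\Bigr\}. \qquad (1)$$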

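For simplicity, we write $G(n,\mathbf{p}_4)$ for $G(n,p_2,p_3,p_4)$. It is not difficult to check that

$$\mathrm{TV}\bigl(G(n,\mathbf{p}_4),G(n,p_2)\bigr) = \sum_{G \in \mathcal{G}}\Bigl(\Pr[G(n,p_2) = G] - \min\bigl\{\Pr[G(n,\mathbf{p}_4) = G],\ \Pr[G(n,p_2) = G]\bigr\}\Bigr), \qquad (2)$$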

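where $\mathcal{G}$ is the set of all graphs on $V$. In order to bound the total variation distance, we consider a
lower bound of $\Pr[G(n,p_2,p_3,p_4) = G]$. Since $G(n,p_2,p_3,p_4) = K(\mathcal{H}_4(n,p_4)) \cup K(\mathcal{H}_3(n,p_3)) \cup G(n,p_2)$, we may
write $\Pr[G(n,p_2,p_3,p_4) = G]$ as the sum of

$$\Pr\bigl[\mathcal{H}_4(n,p_4) = Q,\ \mathcal{H}_3(n,p_3) = T,\ G \setminus (K(T) \cup K(Q)) \subseteq G(n,p_2) \subseteq G\bigr] \qquad (3)$$

over all possible $T$ and $Q$. Let $\mathcal{H}_3(G)$ and $\mathcal{H}_4(G)$ be the collections of all $K_3$'s and $K_4$'s in $G$ that are
regarded as collections of 3-subsets and 4-subsets of $V$, respectively. Then,

$$\Pr[G(n,p_2,p_3,p_4) = G] = \sum_{\substack{Q \subseteq \mathcal{H}_4(G) \\ T \subseteq \mathcal{H}_3(G)}} p_4^{|Q|}(1-p_4)^{\binom{n}{4}-|Q|}\,p_3^{|T|}(1-p_3)^{\binom{n}{3}-|T|}\,p_2^{|G|-|K(Q)\cup K(T)|}(1-p_2)^{\binom{n}{2}-|G|}$$

$$= \Pr[G(n,p_2) = G]\sum_{\substack{Q \subseteq \mathcal{H}_4(G) \\ T \subseteq \mathcal{H}_3(G)}} p_4^{|Q|}(1-p_4)^{\binom{n}{4}-|Q|}\,p_3^{|T|}(1-p_3)^{\binom{n}{3}-|T|}\,p_2^{-|K(Q)\cup K(T)|},$$

where $|G|$ is the number of edges in $G$. Let $G \setminus K(Q)$ be the graph obtained from $G$ by removing the
edges of the graph $K(Q)$. For each $Q \subseteq \mathcal{H}_4(G)$, taking only the case that $T \subseteq \mathcal{H}_3(G \setminus K(Q))$ yields
that

$$\frac{\Pr[G(n,p_2,p_3,p_4) = G]}{\Pr[G(n,p_2) = G]} \ge \sum_{Q \subseteq \mathcal{H}_4(G)} p_4^{|Q|}(1-p_4)^{\binom{n}{4}-|Q|}p_2^{-|K(Q)|}\sum_{T \subseteq \mathcal{H}_3(G \setminus K(Q))} p_3^{|T|}(1-p_3)^{\binom{n}{3}-|T|}p_2^{-|K(T)|}. \qquad (4)$$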











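In the case that the expected number $\binom{n}{3}p_3 = \Theta(n^3mp^3)$ of artifact triangles is small, say $p < \frac{\varepsilon^{1/3}}{nm^{1/3}}$,
one may take $T, Q = \emptyset$ in the lower bound of (3) to obtain

$$\Pr[G(n,p_2,p_3,p_4) = G] \ge \Pr[G(n,p_2) = G]\,(1-p_4)^{\binom{n}{4}}(1-p_3)^{\binom{n}{3}},$$

and then (2) gives that

$$\mathrm{TV}\bigl(G(n,p_2,p_3,p_4),G(n,p_2)\bigr) \le \sum_{G \in \mathcal{G}}\Pr[G(n,p_2) = G]\Bigl(1 - (1-p_4)^{\binom{n}{4}}(1-p_3)^{\binom{n}{3}}\Bigr) = O(\varepsilon)$$

as $\binom{n}{3}p_3 = \Theta(n^3mp^3) = O(\varepsilon)$ and $\binom{n}{4}p_4 = \Theta(n^4mp^4) = O(\varepsilon)$. If $m = n^{\alpha}$ for $\alpha > 6$, then this holds for
all $p < \bigl(\frac{3\log n}{m}\bigr)^{1/2}$ since $\frac{\varepsilon^{1/3}}{nm^{1/3}} > \bigl(\frac{3\log n}{m}\bigr)^{1/2}$, which essentially implies the result of [13].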

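We now assume that

$$\frac{\varepsilon^{1/3}}{nm^{1/3}} \le p < \Bigl(\frac{3\log n}{m}\Bigr)^{1/2}.$$

For any set $\mathcal{G}^*$ of graphs on $V$, using (2), we have that the total variation distance is at most

$$\Pr[G(n,p_2) \notin \mathcal{G}^*] + \sum_{G \in \mathcal{G}^*}\Bigl(\Pr[G(n,p_2) = G] - \min\bigl\{\Pr[G(n,p_2,p_3,p_4) = G],\ \Pr[G(n,p_2) = G]\bigr\}\Bigr).$$

Therefore it should be enough to consider the graphs $G$ satisfying

$$|\mathcal{H}_3(G)| \approx \binom{n}{3}p_2^3 \quad\text{and}\quad |\mathcal{H}_4(G)| \approx \binom{n}{4}p_2^6,$$

the exact meaning of which will be defined later.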

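We first give an intuition behind the proof that will be given later. Recalling (4), it turns out that

$$\sum_{T \subseteq \mathcal{H}_3(G \setminus K(Q))} p_3^{|T|}(1-p_3)^{\binom{n}{3}-|T|}p_2^{-|K(T)|} \le \sum_{t \ge 0}\ \sum_{\substack{T \subseteq \mathcal{H}_3(G) \\ |T| = t}} p_3^{t}(1-p_3)^{\binom{n}{3}-t}p_2^{-3t}.$$

Since

$$\binom{\binom{n}{3}p_2^3}{t}p_2^{-3t} \le \binom{\binom{n}{3}}{t},$$

it follows that, for graphs $G$ with $|\mathcal{H}_3(G)| \approx \binom{n}{3}p_2^3$, the sum above is essentially at most

$$\sum_{t \ge 0}\binom{\binom{n}{3}}{t}p_3^{t}(1-p_3)^{\binom{n}{3}-t} = 1. \qquad (5)$$

Similarly,

$$\sum_{Q \subseteq \mathcal{H}_4(G)} p_4^{|Q|}(1-p_4)^{\binom{n}{4}-|Q|}p_2^{-|K(Q)|} \le 1.$$

Therefore, the lower bound of (3) is close to 1 only when all the upper bounds are quite tight. In
particular, to have the inequality (5) tight, we need that $|K(T)| = 3t$ for most collections $T$ of $t$
triangles in $G$ for $t \approx \binom{n}{3}p_3$ unless $p_2$ is almost 1. If $t$ is not close to $\binom{n}{3}p_3$, then the summands
are small enough to be negligible. Note that $|K(T)| = 3t$ means that there is no pair of triangles
in $\mathcal{H}_3(n,p_3) = T$ with a common edge. We consider two cases below depending upon whether the
expected number of pairs of artifact triangles $O\bigl(\binom{n}{4}\binom{m}{2}p^6\bigr) = O(n^4m^2p^6)$ is small or not.

We will prove the following two lemmas, from which the main proposition easily follows. Recall
that $\varepsilon = \max\bigl\{\frac{1}{\log n}, \frac{1}{\log(m/n^4)}\bigr\}$ and $p_k = 1 - e^{-mp^k(1-p)^{n-k}}$.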
Lemma 4.2. Suppose that


ri and 


< p < 


Then 


Lemma 4.3. Suppose that 


TV(^G(re,p 2 ,P 3 ,pJ,G(n,P 2 )^ = 0 {e). 


and 


^2/3j^l/3 


< p < 


/SlognN 1/2 


\ m 




Then 


TV 


(G{n,p^,p^,p^),G{n,p^)^ = 0 {e). 


(If $m/n^4$ is too large, then there is no such $p$, so the conclusion is trivially true. On the
other hand, if it is not too large, e.g., $m = n^4\log\log n$, then there are $p$ satisfying the conditions.)

Before we prove Lemmas 4.2 and 4.3, three preliminary lemmas are introduced.

Lemma 4.4. For n‘^ and < P < ^ 2 / 3 ^i /3 ; suppose that a graph G on V satisfies 

fi) |H 3 (G)| > (1 - 5){l)p), where 5 := \ ‘ , 

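(ii) the number $I(G)$ of diamond graphs (i.e., $K_4$ minus one edge) in $G$ is at most $n^4p_2^5/\varepsilon$.

Then the number of sets $T$ such that $T \subseteq \mathcal{H}_3(G)$, $|T| = t$ and $|K(T)| = 3t$ is at least

$$(1 - O(\varepsilon))\binom{\binom{n}{3}}{t}p_2^{3t} \quad\text{for } 0 \le t \le t_0 := \frac{n^3mp^3}{\varepsilon},$$

where the constant in $O(\varepsilon)$ is independent of $G$ and $t$.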
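The quantities in Lemma 4.4 can be computed by brute force on a small graph. The sketch below is only an illustration with hypothetical function names: it lists the triangles of a graph, counts the pairs of triangles sharing an edge (each such pair spans a copy of $K_4$ minus one edge), counts the families $T$ with $|K(T)| = 3t$, and compares that count with the elementary lower bound $\binom{N}{t} - I\binom{N}{t-2}$, where $N$ is the number of triangles and $I$ the number of sharing pairs.

```python
from itertools import combinations
from math import comb


def triangles(n, edges):
    """All 3-subsets of {0, ..., n-1} that span a triangle in the given edge set."""
    edge_set = {frozenset(e) for e in edges}
    return [t for t in combinations(range(n), 3)
            if all(frozenset(pair) in edge_set for pair in combinations(t, 2))]


def sharing_pairs(tris):
    """Number of pairs of distinct triangles with a common edge (diamond copies)."""
    return sum(1 for s, t in combinations(tris, 2) if len(set(s) & set(t)) == 2)


def edge_disjoint_families(tris, t):
    """Number of t-sets T of triangles whose union has exactly 3t edges."""
    count = 0
    for family in combinations(tris, t):
        covered = set()
        for tri in family:
            covered.update(frozenset(pair) for pair in combinations(tri, 2))
        if len(covered) == 3 * t:
            count += 1
    return count


if __name__ == "__main__":
    n = 5
    complete = list(combinations(range(n), 2))  # the complete graph K_5
    tris = triangles(n, complete)
    N, I = len(tris), sharing_pairs(tris)
    print(N, I, edge_disjoint_families(tris, 2), comb(N, 2) - I * comb(N, 0))
```

The enumeration is exponential in `t`, so it serves as a check of the counting argument on toy graphs rather than as a practical tool.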

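Proof. Let $X_t(G)$ be the number of sets $T$ such that $T \subseteq \mathcal{H}_3(G)$, $|T| = t$ and $|K(T)| = 3t$. We infer
that

$$X_t(G) \ge \binom{|\mathcal{H}_3(G)|}{t} - I(G)\binom{|\mathcal{H}_3(G)|}{t-2}$$

$$= \Bigl(1 - \frac{t(t-1)I(G)}{(|\mathcal{H}_3(G)|-t+2)(|\mathcal{H}_3(G)|-t+1)}\Bigr)\binom{|\mathcal{H}_3(G)|}{t}$$

$$\ge \Bigl(1 - \frac{t_0^2\,I(G)}{(|\mathcal{H}_3(G)|-t_0)^2}\Bigr)\binom{|\mathcal{H}_3(G)|}{t}.$$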


Since \'H 3 {G)\ = Vt{mp 2 ) , we have that 


I^3(g)| 


= o 


S 2 fi 

nmp 

e^pI 


= O 


S “^2 6 

n nmp 

£‘^m 


= O 


( 6 ) 


where the second equality follows from^i^ = and the third equality follows fromp < ^ 273 ^ 1/3 ■ 

In particular 11 / 3 (G)| ;§> tQ, and hence 


It is easy to check from Q that 

2tp{G) _ 2/(0) 


i//3(g)|2 |/^3(g)i mm 


= o 


4 5 2 

n^p^ e 

en^p^ n 


) = 0(6) 
and


> 1 - 


\^ l^3(g)P 




> 1-0 


as \n3{G)\ > (1 - 6 ){fjp^. Since p, = ©(if^) and p < yield 


> (1 - 0(^))(1 - (®) p ; 


, 3 t 

'2 


{St,f = o[ 


4 4 

n mp ^ n 




e^m 


= O 


n^\i /3 1 / n 


m ) 


+ 


n ve^m 


= 0(e2), 


the desired lower bound for Xt{G) follows. 


(7) 

□ 


The same argument gives the next lemma regarding $\mathcal{H}_4(G)$.

1/2 


Lemma 4.5. For m'^ rF and 




< 


P < ^ ^ ’ suppose that a graph G on V satisfies 


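$$|\mathcal{H}_4(G)| \ge \Bigl(1 - \frac{1}{\varepsilon n}\Bigr)\binom{n}{4}p_2^6.$$

Then the number of $Q \subseteq \mathcal{H}_4(G)$ with $|Q| = q$ and $|K(Q)| = 6q$ is at least

$$(1 - O(\varepsilon))\binom{\binom{n}{4}}{q}p_2^{6q} \quad\text{for } 0 \le q \le q_0 := \frac{n^4mp^4}{\varepsilon},$$

where the constant in $O(\varepsilon)$ is independent of $G$ and $q$.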

Remark. The expected number of columns of the matrix $R(n,m;p)$ with four or more 1's is $O(n^4mp^4)$.
The parameter $q_0$ is chosen to be substantially, but not extremely, bigger than the expected number
$O(n^4mp^4)$.

Proof. Let Yq{G) be the number of Q ^ 774(0) with jQI = q and |it'(Q)| = 6(7. Observe that the 
number of pairs of 7^4 in the complete graph on V sharing at least an edge is at most ( 2 ) (2) (2) ^ 


Thus 


y,(G) > 

= d- 

>ti- 


1I«4(g)I\ _ 


1I«4(G)I\ 

q(q — l)n® 


/^|774(0)|\ 

(|774(0)|-(7 + 2)(|774(0)|-g + l)y V 7 J 

qln^ \ (\ni{G)W 


(1774(0)1-(70)27 V q )' 

Since mp^ > m ^ ^273^1/3 ^ have that p^ = 1 — o(l) and 


Therefore q^ = ^ = 0 (elog 2 n) implies that 


q'^n^ 


( 1774 ( 0)1 


= 0{s), 


and hence 


Since = O = 0 {s) and |^ = O 


i;(0) > (l-0(e))(^l^"^‘^^l^. 


I'KrlG)! 


log~_n\ _ 


= 0 (e), we have that 


CT') 4 (- piik)-™ 4 "-«•»(' - 4 
which gives the desired lower bound for $Y_q(G)$.


□ 


\ ^ 

Lemma 4.6. For 5 = - f + ^- 3 ^ 


, let Qj, be the set of all graphs G on V satisfying 


n 


> (1 - <5)( 3 ]p^ and /(G) < n^p^/e, 


recalling that I{G) denotes the number of diamond graphs as in Lemma \4-4\ o.nd let Q 4 be the set of 
all graphs G in G 3 satisfying 


\n,{G)\> 1 - 


1 


en 


n 


P 2 


Then for m ^ n‘^ we have 


Pr[G(n,j^) € ^ 3 ] = 1 - 0{e) for 


nm 


1/3 


and 


Pr[G(n,P 2 ) G ^ 4 ] = 1 - 0 {e) for 
where the constants in 0 {e) are independent of p. 


j^2/3^1/3 


< p < 


< p < 


/3iognN 1/2 

\ m J 

3iogn\i/2 
m 


Proof. For X 3 := |'H 3 (G(n,;jj))|, Chebyshev’s inequaiity gives that 


Pr 


X 3 < (1 - 6) 


n 


P 2 


< Pr 


|X3-.F[X3]|>J( 


n 


< = 0(.=) 




as E[X^] = ( 3 )^ 2 ^ and Var[// 3 ] = O [{n'^p^ + n^p^){l — f^)). Moreover, Markov’s inequaiity imphes 
that 


Pr 


^(G(n,P2)) > 


4 5 
n pf 


< e 


since -E [/(G(n,P2))] = (4)6 ■ p^ < n^p^. Therefore, 


VT[G{n,p,) ^ 03] < Pr X 3 < (1 - 5) ( ^ ]p^ 


n 


+ Pr 


I{G{n,p.,)) > 


4 5 
n n 


= o(e)- 


Simiiariy, for X 4 = |'H 4 (G(n,/:^))|, it is not hard to see that 

E[X 4 ] = (^P 2 Var[X 4 ] = O (n®) 

as P 2 = 1 “ 0 ( 1 ) foi' P > „ 2 / 3 ^i/ 3 ; s-nd Chebyshev’s inequaiity yieids that 

1 fn 


Pr 

Therefore, 


n^/ '^m 

1 \ ^n 


X4< 1- — .. . 

en/ V4 


< Pr 


|X,-£|X 4 ||>- 


^ e'‘r? Var|X4| ^ 


Oh 


12 


Pr[G(n,;^) ^ 04] < PT[G{n,p,) ^ 03 ] + Pr 


X4 < 1- 


\ \ (n 


en/ V4 


= 0 {e). 


□ 


Now we prove the main lemmas.
Proof of Lemma 4.2. Equality (2) and Lemma 4.6 imply that the total variation distance between
$G(n,p_2,p_3,p_4)$ and $G(n,p_2)$ is at most


Fr[G{n,p^) iQf\+ ^ (^FT[G{n,p^) = G]- min { Pr[G(n, pj = G],FT:[G{n,p^) = G]}) 
GeSa 

= 0(e)+ (^Pr[G(n,P2) = G]-min{Pr[G(n,pJ = G],Pr[G(n,P2) = G] 


GgSs 

Taking Q = 0 in Q, we have that 

Pr[G(n,P4) = G] > Pr[G(n,P2) = G](l - p4)^4,'^ ^ 

TCHsiG) 

= (1 - 0(e))Pr[G(n,j^) = G] Y Pr'(l - 

TCH3[G) 

as (2 )p4 = Q{n^mp‘^) = 0(e). For G G O 3 , Lemma 03] gives that 

TCHafG) t=0 Tcns(G) 

\T\=t,\K{T)\=3t 


> (1-0(e))^ -P3)(3) \ 


and 


Pr[G(n,P4) = G] 




Pr[G(n,P2) = G] 

Since 4 = n^mp^/e = Q(n^p^/e), Markov’s inequality yields that 

4 


^ ^ = 1-Fr Bin(^Q,pj)>4 


= l-0(e), 


where Bin(n4p0 is the binomial random variable with parameters n' and p'. Therefore, 
Pr[G(n,P4) = G] > (1 - 0(e)) Pr[G(n,a) = G] for G € ^3, 
which together with ([8]) implies that TV^G(n, p^), G(n,^)^ = 0(e), provided 


of and 


^2/3^1/3 


< p < 


3 log n\ 1/2 


m 


( 8 ) 


□ 


Proof of Lemma 




As in the proof of Lemma 14.21 it follows from dJj) and Lemma 14.61 that 


TV(^G(n,P4),G(n,p2)) 

0(£)+ Y = G']-™™{Pi'[G'(^>P4) = = G*]})- 

g&Ga 


< 


( 9 ) 












Let Q C 7^4(G), and we write G\Q for G\K{Q) for brevity. For G € Q 4 , the sum in the lower bound 
of dH) restricted to the cases |T| < 4 = n^mp^/e and IQj < = n^mp^/e gives 

% 


Pr[G(n,pJ = G] 


> 


E E 


P. 


Pr[G(n,m) = Gl - ^ ^ "" 

>■'' j q=0 QC-H 4 (G) 

\Q\^<l,\K(Q)\^6q 

Lemma 14.51 and Markov’s inequality imply that 
% 


''(1 ^ ^‘(1 


t=0 TCn 3 (G\Q) 
\T\=t 


g=0 QC-H4(G) g=0 ^ ^ 

IQI=9,|K(Q)|=69 


Bin 


P 4 >Qo 


= (1 -0(e))(^l -Pr 

= 1-0(8), 

where Bin(n4p0 is a binomial random variable with parameters n' and p'. Therefore, 

>(1-0(6)) min W V 

Pr[G(n,P2) = G] ^ ocwum ^ ^ .t -2 


QCW4(G) 

IQI<9n TCH 3 (G\Q) 

^ \T\=t 


( 10 ) 


For $T \subseteq \mathcal{H}_3(G)$, let $I^*(T)$ be the number of pairs of distinct triangles in $T$ with a common edge.
(It is a bit different from the definition $I(\cdot)$ in Lemma 4.4.) For an edge $e$, let $d_T(e)$ be the number of
triangles in $T$ which contain $e$. Then

$$3|T| - |K(T)| = \sum_{e:\,d_T(e) \ge 2}(d_T(e) - 1) \le \sum_{e:\,d_T(e) \ge 2}\binom{d_T(e)}{2} = I^*(T).$$

For a fixed $Q \subseteq \mathcal{H}_4(G)$ with $|Q| \le q_0$, we will show that the number of $T \subseteq \mathcal{H}_3(G \setminus Q)$ with
$|T| = t \le t_0 = n^3mp^3/\varepsilon$ and $I^*(T) \le r := n^4m^2p^6/\varepsilon^2$ is at least

$$(1 - O(\varepsilon))\binom{\binom{n}{3}}{t}p_2^{3t}. \qquad (11)$$


Then 




t =0 TCn 3 (G\Q) 
|T|=t 


t=0 TCH3(G\Q) 
\T\^t,I*(T)<r 


> (l-0(£))p;-^r*'3)")pi(l_pj(3) ‘ 
t=0 ^ ^ 


>(i-o(^))pr, 

where the last inequality follows from Markov’s inequality. Since p^ = (1 — > 1 — 

0 (re~^P^) and 


re 


-mp2 _ n 


e-mp = ^ . (mp2)3e-"^P =0(8), 


8^m 


we have that p^ = 1 — 0(e) and 


£ Y1 ^ 3 ( 1 -%)^"^ ^2 > 1 - 0 (e). 

t=0 TCW3(G\Q) 

|Tl=i 

This together with (|10l) and ([9]) completes the proof of Lemma 14.31 
It remains to prove (11). For $t \le t_0$, we take the uniform random collection $R = R(t)$ of triangles
that is equally likely to be $T$ for every $T \subseteq \mathcal{H}_3(G \setminus Q)$ with $|T| = t$. In other words, for every
$T \subseteq \mathcal{H}_3(G \setminus Q)$ with $|T| = t$,

$$\Pr[R = T] = \binom{|\mathcal{H}_3(G \setminus Q)|}{t}^{-1}.$$

Since the number of sets $T \subseteq \mathcal{H}_3(G \setminus Q)$ with $|T| = t$ containing a diamond graph is less than or equal
to $I(G)\binom{|\mathcal{H}_3(G \setminus Q)|}{t-2}$, we have that


E[P{R)] < I{G) 


\no{G\Q)\\(\no{G\Q)\\ 


t - 2 


-1 






< 


I{G)tl 


{m{G\Q)\-to) 


2 ’ 


where /(G) is defined in Lemma For G € Gi, since K{Q) has at most Q\Q\ < 6q^ 
0(elog^ n) edges and each edge in G is contained in at most n triangles in 7io{G), 

iRsiG \ Q)| > |/^3(G)| - 6q,n = {Rom - 0(enlog^ n) = ©(n^). 


6n‘^mp‘^ 


( 12 ) 


As 4 = 


< and /(G) < I{Kn) = 6(2) < n^ 


E[r{R)] = o 

and Markov’s inequality gives that 


I{G)tl 


rr 


= o(^]=o 




4 2 fi 

n m p 


Pr [/*(/?) > r] < 


e^E[r{R)] 

n^rrfip^ 


= 0{e). 


I^3(G\Q)|Y 


The number Z of T C RoiG \ Q) with |T| = t and /*(T) < r satisfies 

Z = il-Ois))\ 

Now we estimate Since G € ^4 and = 1 — o(l), it is obtained similarly to (fT^ that 

|i^3(G \ Q)| > mG)\- 6 q,n = (l - d - O (^)) 


and then 


(l«3(CVQ)IU(, 


> i 1 — L6 — o 


4 go 


1-0 


n- 




As in d?]), 5tg = 0(e), and it is easy to check that 


Tpm?p'^ 

IT? 


= O 


n 


log^/2 


n 


e^rn?G 


Therefore, we have that 

Z = (l-0(e)) 

This completes the proof of IfTTI) . 


= 0(e) and = 

l^3(G\g)|\ 


tg r?rrP‘p^ 


e^ 


= O 


log^ n 
e^m 


) = 0(e)- 




> (1 - 0(e)) 




□ 















5. TOTAL VARIATION DISTANCE BETWEEN $G(n,p_2)$ AND $G(n,\hat p)$

For the random graphs $G(n,m;p)$, $G(n,(p_k))$, $G(n,p_2,p_3,p_4)$ and $G(n,p_2)$, we have so far considered
the total variation distance between the consecutive pairs of them. Finally, a good upper bound for
the total variation distance between $G(n,p_2)$ and $G(n,\hat p)$ easily follows from an upper bound for the
total variation distance between two binomial distributions $\mathrm{Bin}(N,p)$ and $\mathrm{Bin}(N,q)$. As a corollary

of Theorem 2.2 in m, we may have 

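Corollary 5.1. Let $N$ be a positive integer, and $p$ and $q$ be real numbers satisfying $0 < p < q < 1$.
For $\delta$ satisfying $(q-p)N = \delta\sqrt{p(1-p)N}$, i.e., $\delta = (q-p)\sqrt{\frac{N}{p(1-p)}}$, we have

$$\mathrm{TV}\bigl(\mathrm{Bin}(N,p),\mathrm{Bin}(N,q)\bigr) \le \delta + 3\delta^2.$$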
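The bound of Corollary 5.1 can be compared with the exact distance for concrete parameters. The following sketch is an illustration only, with hypothetical function names and arbitrarily chosen sample values; it computes the total variation distance between two binomial distributions with a common $N$ and evaluates $\delta + 3\delta^2$.

```python
import math


def binomial_pmf(N, p, k):
    """Pr[Bin(N, p) = k] for 0 < p < 1, computed in log space."""
    log_pmf = (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
               + k * math.log(p) + (N - k) * math.log1p(-p))
    return math.exp(log_pmf)


def tv_binomials(N, p, q):
    """Total variation distance between Bin(N, p) and Bin(N, q)."""
    return 0.5 * sum(abs(binomial_pmf(N, p, k) - binomial_pmf(N, q, k)) for k in range(N + 1))


def corollary_bound(N, p, q):
    """The quantity delta + 3 delta^2 with delta = (q - p) * sqrt(N / (p (1 - p)))."""
    delta = (q - p) * math.sqrt(N / (p * (1 - p)))
    return delta + 3 * delta ** 2


if __name__ == "__main__":
    N, p, q = 100, 0.3, 0.305
    print(tv_binomials(N, p, q), corollary_bound(N, p, q))
```

In the application below, $N = \binom{n}{2}$, and the two edge probabilities are $p_2$ and $\hat p$.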

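Recalling $p_2 = 1 - e^{-mp^2(1-p)^{n-2}}$, $\hat p = 1 - (1-p^2)^m$ and $\varepsilon = \max\bigl\{\frac{1}{\log n}, \frac{1}{\log(m/n^4)}\bigr\}$, we have the last
inequality needed.

Corollary 5.2. Suppose that $m \gg n^4$ and $p < \bigl(\frac{3\log n}{m}\bigr)^{1/2}$. Then

$$\mathrm{TV}\bigl(G(n,p_2),G(n,\hat p)\bigr) = O(\varepsilon).$$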

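Proof. Let $p = \bigl(\frac{c}{m}\bigr)^{1/2}$ for $0 < c < 3\log n$. Since $p_2 = \Theta\bigl(\frac{mp^2}{1+mp^2}\bigr) = \Theta\bigl(\frac{c}{1+c}\bigr)$ and

$$\hat p - p_2 = e^{-mp^2(1-p)^{n-2}} - (1-p^2)^m = O(nmp^3e^{-mp^2}),$$

we have that

$$(\hat p - p_2)\Bigl(\frac{\binom{n}{2}}{p_2(1-p_2)}\Bigr)^{1/2} = O\Bigl(\frac{n^2mp^3}{e^{mp^2}\sqrt{p_2(1-p_2)}}\Bigr) = O\Bigl(\Bigl(\frac{n^4}{m}\Bigr)^{1/2}\Bigr) = O(\varepsilon).$$

Therefore, Corollary 5.1 implies that

$$\mathrm{TV}\bigl(G(n,p_2),G(n,\hat p)\bigr) \le \mathrm{TV}\Bigl(\mathrm{Bin}\Bigl(\binom{n}{2},p_2\Bigr),\mathrm{Bin}\Bigl(\binom{n}{2},\hat p\Bigr)\Bigr) = O(\varepsilon).$$

□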


6. Concluding remark 

Fill, Scheinerman and Singer-Cohen [13] showed that the total variation distance between $G(n,m;p)$
and $G(n,\hat p)$ tends to 0 for $m = n^{\alpha}$, $\alpha > 6$. In this paper, we improve the result. Namely, the total
variation distance still goes to 0 for $m \gg n^4$. If $m \gg n^4$, then the expected number of pairs of artifact
triangles with a common edge is small enough, or both of the two random graphs are complete graphs
with high probability. This is the main ingredient of the proof of Theorem 1.2.

Our result naturally gives rise to the question whether the condition $m \gg n^4$ is tight. We initially
believed that the total variation distance between $G(n,m;p)$ and $G(n,\hat p)$ is not close to 0 if $m$ is
smaller than $n^4$. However, the more we try to prove it, the more we feel that our initial belief is
baseless. It would not be extremely surprising even if the total variation distance tends to 0 for some
$m$ much less than $n^4$.